**Name: Faizan Nazir**

**Arid No: 19-ARID-5157**

**Program: BSCS7A**

**Course: Parallel and Distributed Computing**

**Lab Task: 03**

**Teacher Name: Ms. Sadia Zar**

**Q: Consider a memory system with a level 1 cache of 32 KB and DRAM of 512 MB with the processor operating at 1 GHz. The latency to L1 cache is one cycle and the latency to DRAM is 100 cycles. In each memory cycle, the processor fetches four words (cache line size is four words). What is the peak achievable performance of a dot product of two vectors? Note: Where necessary, assume an optimal cache placement policy.**

**/\* dot product loop \*/**

**for (i = 0; i < dim; i++)**

**dot\_prod += a[i] \* b[i];**

**Ans:**

Latency of cache = 1 cycle

Latency of DRAM = 100 cycles

Processor speed = 1 GHz, so 1 cycle = 1 / (1 \* 10^9) seconds = 1 ns

1 access to DRAM takes 100 / (1 \* 10^9) seconds = 100 ns

1 access to cache takes 1 / (1 \* 10^9) seconds = 1 ns

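Each iteration involves 2 floating-point operations (one multiply and one add)

Each DRAM access fetches one cache line of four words, so one line of a[] and one line of b[] supply the operands for 4 iterations

Operations per pair of line fetches = 4 \* 2 = 8 FLOPs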

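In the first of every four iterations the cache misses 2 times, once for a[] and once for b[]; the next three iterations hit in the cache, and their 1 ns hit latency is neglected here

Time for the two misses = 2 \* 100 ns = 200 ns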

Performance = 8 FLOPs / 200 ns

= 8 / (200 \* 10^(-9))

= 8 \* 10^(9) / 200

**= 40\*10^6**

**= 40 MFLOPS**